Interactive Text2Pickup Network for Natural Language based Human-Robot Collaboration
In this paper, we propose the Interactive Text2Pickup (IT2P) network for
human-robot collaboration which enables an effective interaction with a human
user despite the ambiguity in user's commands. We focus on the task where a
robot is expected to pick up an object instructed by a human, and to interact
with the human when the given instruction is vague. The proposed network
understands the command from the human user and estimates the position of the
desired object first. To handle the inherent ambiguity in human language
commands, a suitable question which can resolve the ambiguity is generated. The
user's answer to the question is combined with the initial command and given
back to the network, resulting in more accurate estimation. The experiment
results show that given unambiguous commands, the proposed method can estimate
the position of the requested object with an accuracy of 98.49% based on our
test dataset. Given ambiguous language commands, we show that the accuracy of
the pickup task increases by a factor of 1.94 after incorporating the information
obtained from the interaction.
Comment: 8 pages, 9 figures
Text2Action: Generative Adversarial Synthesis from Language to Action
In this paper, we propose a generative model which learns the relationship
between language and human action in order to generate a human action sequence
given a sentence describing human behavior. The proposed generative model is a
generative adversarial network (GAN), which is based on the sequence to
sequence (SEQ2SEQ) model. Using the proposed generative network, we can
synthesize various actions for a robot or a virtual agent using a text encoder
recurrent neural network (RNN) and an action decoder RNN. The proposed
generative network is trained from 29,770 pairs of actions and sentence
annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video
dataset. We demonstrate that the network can generate human-like actions which
can be transferred to a Baxter robot, such that the robot performs an action
based on a provided sentence. Results show that the proposed generative network
correctly models the relationship between language and action and can generate
a diverse set of actions from the same sentence.
Comment: 8 pages, 10 figures
Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks
Deep networks consume a large amount of memory by their nature. A natural
question arises: can we reduce that memory requirement whilst maintaining
performance? In particular, in this work we address the problem of memory
efficient learning for multiple tasks. To this end, we propose a novel network
architecture producing multiple networks of different configurations, termed
deep virtual networks (DVNs), for different tasks. Each DVN is specialized for
a single task and structured hierarchically. The hierarchical structure, which
contains multiple levels of hierarchy corresponding to different numbers of
parameters, enables inference under several different memory budgets. The
building block of a deep virtual network is based on a disjoint collection of
parameters of a network, which we call a unit. The lowest level of hierarchy in
a deep virtual network is a unit, and higher levels of hierarchy contain lower
levels' units and other additional units. Given a budget on the number of
parameters, a different level of a deep virtual network can be chosen to
perform the task. A unit can be shared by different DVNs, allowing multiple
DVNs in a single network. In addition, shared units provide assistance to the
target task with additional knowledge learned from other tasks. This
cooperative configuration of DVNs makes it possible to handle different tasks
in a memory-aware manner. Our experiments show that the proposed method
outperforms existing approaches for multiple tasks. Notably, ours is more
efficient than others as it allows memory-aware inference for all tasks.
Comment: CVPR 201
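The level-versus-budget idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation. It assumes that level k of a deep virtual network consists of units 0 through k, so its parameter count is a running sum of unit sizes, and that the highest level fitting the budget is the one chosen. The function names and the unit sizes are hypothetical.

```python
from typing import List, Optional


def cumulative_level_sizes(unit_sizes: List[int]) -> List[int]:
    """Parameter count per level, assuming level k holds units 0..k."""
    sizes, total = [], 0
    for n in unit_sizes:
        total += n
        sizes.append(total)
    return sizes


def choose_level(unit_sizes: List[int], budget: int) -> Optional[int]:
    """Return the highest level whose parameter count fits the budget.

    Returns None when even the lowest level (a single unit) is too large.
    Picking the highest feasible level is an assumption of this sketch.
    """
    chosen = None
    for level, size in enumerate(cumulative_level_sizes(unit_sizes)):
        if size <= budget:
            chosen = level
    return chosen


if __name__ == "__main__":
    unit_sizes = [100, 200, 400]  # hypothetical parameter counts per unit
    for budget in (50, 350, 700):
        print(budget, choose_level(unit_sizes, budget))
```

Because units are disjoint and nested across levels in this sketch, a single stored parameter set serves every budget; only the number of units loaded changes.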
Deep Elastic Networks with Model Selection for Multi-Task Learning
In this work, we consider the problem of instance-wise dynamic network model
selection for multi-task learning. To this end, we propose an efficient
approach to exploit a compact but accurate model in a backbone architecture for
each instance of all tasks. The proposed method consists of an estimator and a
selector. The estimator is based on a backbone architecture and structured
hierarchically. It can produce multiple different network models of different
configurations in a hierarchical structure. The selector chooses a model
dynamically from a pool of candidate models given an input instance. The
selector is a relatively small-size network consisting of a few layers, which
estimates a probability distribution over the candidate models when an input
instance of a task is given. Both estimator and selector are jointly trained in
a unified learning framework in conjunction with a sampling-based learning
strategy, without additional computation steps. We demonstrate the proposed
approach for several image classification tasks compared to existing approaches
performing model selection or learning multiple tasks. Experimental results
show that our approach gives not only outstanding performance compared to other
competitors but also the versatility to perform instance-wise model selection
for multiple tasks.
Comment: ICCV 201
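The selector described in the abstract above outputs a probability distribution over candidate models. A minimal sketch of how such a distribution might be used follows. It assumes a softmax over selector scores, sampling from the distribution during training, and taking the most probable model at inference; none of these details is stated in the abstract, and the function names are hypothetical.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_model(logits: List[float], training: bool, rng: random.Random) -> int:
    """Pick a candidate model index from the selector's scores.

    Training: sample an index in proportion to its probability (assumed here
    as one form of sampling-based strategy).
    Inference: return the index with the highest probability (assumed).
    """
    probs = softmax(logits)
    if training:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    scores = [0.1, 2.0, -1.0]  # hypothetical selector outputs for 3 candidates
    rng = random.Random(0)
    print(softmax(scores))
    print(select_model(scores, training=False, rng=rng))
    print(select_model(scores, training=True, rng=rng))
```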